Sensory neuroscience seeks to understand how the brain encodes natural environments. However, 
neural coding has largely been studied using simplified stimuli. In order to assess whether the 
brain's coding strategy depends on the stimulus ensemble, we apply a new information-theoretic 
method that allows unbiased calculation of neural filters (receptive fields) from responses to natural 
scenes or other complex signals with strong multipoint correlations. In the cat primary visual 
cortex we compare responses to natural inputs with those to noise inputs matched for luminance 
and contrast. We find that neural filters adaptively change with the input ensemble so as to increase 
the information carried by the neural response about the filtered stimulus. Adaptation affects the 
spatial frequency composition of the filter, enhancing sensitivity to under-represented frequencies in 
agreement with optimal encoding arguments. Adaptation occurs over 40 s to many minutes, longer 
than most previously reported forms of adaptation. 



The neural circuits in the brain that underlie our be- 
havior are well suited for processing of real-world - or 
natural - stimuli. These neural circuits, especially at 
the higher stages of neural processing, may be largely or 
completely unresponsive to many artificial stimulus sets 
used to analyze the early stages of sensory processing 
and, more generally, for systems analysis. Thus, natu- 
ral stimuli may be necessary to study higher-level neu- 
rons. Characterizing neural responses to natural stim- 
uli at early or intermediate stages of neural process- 
ing, such as the primary visual cortex, is a necessary 
step for systematic studies of higher-level neurons. Neural
responses are also known to be highly nonlinear and adaptive,
making them difficult to predict across different stimulus sets.
Therefore, even early in visual processing, characteriza- 
tions based on simplified stimuli may not be adequate to 
understand responses to the natural environment. 

For these reasons there has been a great deal of interest 
in studying neural responses to complex, natural stimuli.
However, the relationship between coding of natural and
laboratory stimuli remains elusive due to the difficulty of
characterizing neurons - assessing their receptive fields -
from responses to natural stimuli, as we now describe.

A simple and commonly used model of neural responses is the
linear-nonlinear model. In this model, the response of the
neuron depends on linear filtering of the stimulus luminance
values S by a receptive field L defined over some region of
space and time. Mathematically, the filter output at time t is
a sum over the spatial positions (x, y) and temporal delays t'
to which the neuron's response is sensitive:
$\sum_{x,y,t'} L(x, y, t - t')\, S(x, y, t')$,
which we abbreviate as L*S. The output of this filter is
then passed through a nonlinear function f to yield the
neuron's response r: r(t) = f(L*S). The nonlinearity
incorporates the fact that the firing rate cannot be negative
and other aspects of neural response such as threshold,
saturation, and sensitivity or insensitivity to changes in
stimulus polarity. We will use the terms neural filter or
receptive field throughout this paper to mean the linear
part L of the linear-nonlinear model.
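
As an illustration of the two stages just described, the following minimal sketch computes r(t) = f(L*S) for a toy stimulus. The array layout (frames by pixels), the threshold-linear form of the nonlinearity, and all names and numerical values are assumptions made for the example only; they are not taken from the recordings or from the analysis code used in this study. The temporal delay is indexed as d = t - t'.

```python
def filter_output(L, S, t):
    """Linear stage: sum over pixels and delays d of L[d][i] * S[t - d][i]."""
    total = 0.0
    for d, frame_weights in enumerate(L):   # d = temporal delay in frames
        if t - d < 0:
            continue                         # no stimulus before the start
        frame = S[t - d]
        total += sum(w * s for w, s in zip(frame_weights, frame))
    return total


def rectify(x, threshold=0.0, gain=1.0):
    """Hypothetical nonlinearity f: threshold-linear, so rates are never negative."""
    return gain * max(0.0, x - threshold)


def ln_response(L, S, f=rectify):
    """r(t) = f(L*S) evaluated at every time step of the stimulus."""
    return [f(filter_output(L, S, t)) for t in range(len(S))]


# Toy stimulus: 4 frames of 2 pixels. Toy filter: 2 delays x 2 pixels.
S = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
L = [[0.5, -0.5], [1.0, 0.0]]
rates = ln_response(L, S)
print(rates)  # [0.5, 0.5, 0.0, 1.0]
```

Any other non-negative function of the filter output could be passed as `f`; the rectifier is used here only because it is the simplest function that enforces a non-negative firing rate.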

Traditionally, neural receptive fields have been estimated as
the spike-triggered average stimulus (STA; with appropriate
correction for autocorrelation of the inputs) or by related
methods. These methods give unbiased results for linear
systems for any stimulus ensemble or for nonlinear systems if
the ensemble is Gaussian random noise. However, they produce
systematic deviations from the true filter of nonlinear
"linear-nonlinear" neurons probed with natural stimuli
(or other non-Gaussian stimuli), even in situations where
the only nonlinearity is due to a conversion of the output
of a linear receptive field to firing rate. This
happens because natural stimuli, unlike Gaussian stimuli,
which may be completely described by pairwise correlations,
have strong higher-order as well as pairwise
correlations. The higher-order correlations may be
viewed as what distinguishes natural from random Gaussian
stimuli. The bias in the filter estimate calculated
using the Gaussian or linear assumption increases with
the strength of the nonlinearity and with the strength of
stimulus correlations beyond second order, not vanishing
even with infinite data.

Recently an information-theoretic method has been de- 
veloped that correctly estimates receptive fields of non- 
linear model neurons (with extensions to multiple lin- 
ear filters) for arbitrary stimulus ensembles regardless 
of the strength of multi-point correlations, even in cases 
where the STA is zero. According to this method, one
searches for the spatiotemporal filter L whose output,
L*S, carries the most mutual information with the
experimentally measured neuronal response r(t). In practice,
this is done via a gradient ascent procedure, searching in 
the space of all possible spatiotemporal receptive fields 
or filters to find the most informative one (referred to 
as "the most informative dimension", or MID). We can 
then calculate the nonlinearity associated with the MID 
from the data as the probability of a spike given the filter 
output; there is no need to make any assumption about 
the shape of the nonlinearity. 
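
The last step, reading the nonlinearity off the data as the probability of a spike given the filter output, can be sketched as a ratio of binned counts. This is only an illustrative estimator under assumed conventions (equal-width bins, one 0/1 spike indicator per time bin); the function name, bin count and toy data are hypothetical and do not reproduce the procedure in the Supplementary Methods.

```python
def spike_probability(outputs, spikes, n_bins=4):
    """Estimate P(spike | filter output) by binning the filter outputs.

    outputs: filter output L*S in each time bin
    spikes:  0/1 spike indicator for the same time bins
    Returns (bin_edges, p_spike); p_spike is None for empty bins.
    """
    lo, hi = min(outputs), max(outputs)
    width = (hi - lo) / n_bins or 1.0            # guard against constant output
    counts = [0] * n_bins
    spike_counts = [0] * n_bins
    for x, s in zip(outputs, spikes):
        b = min(int((x - lo) / width), n_bins - 1)   # top edge joins last bin
        counts[b] += 1
        spike_counts[b] += s
    edges = [lo + i * width for i in range(n_bins + 1)]
    p_spike = [sc / c if c else None for sc, c in zip(spike_counts, counts)]
    return edges, p_spike


# Toy data: spikes occur only when the filter output is large.
outputs = [-1.0, -0.5, 0.0, 0.2, 0.6, 0.9, 1.0, 1.0]
spikes = [0, 0, 0, 0, 1, 0, 1, 1]
edges, p_spike = spike_probability(outputs, spikes, n_bins=2)
print(edges, p_spike)  # [-1.0, 0.0, 1.0] [0.0, 0.5]
```

No functional form is imposed on the result, which mirrors the point made above: the shape of the nonlinearity is whatever the binned ratio turns out to be.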

Similarly to other "spike-triggered" methods, the MID 
method compares two probability distributions of out- 
puts for a given filter: the distribution of outputs that 
occur before (or trigger) a spike, and the distribution of 
outputs over the entire stimulus ensemble regardless of 
neural response. If a filter represents a stimulus feature 
that affects neural responses, then certain values of its 
output will be more probable before a spike, and so the 
two distributions should differ from one another. The 
various methods all seek filters that maximize the dif- 
ference between the two distributions, but differ in the 
measure of this difference. For the STA, the measure is 
the change in the mean of the two distributions; for the 
spike-triggered covariance method, it is the change
in the variance; and for the MID, it is an information-
theoretic measure (the Kullback-Leibler distance) that
corresponds to the mutual information between the filter 
output and the spikes. The information-theoretic mea- 
sure is more general than the mean or variance, because 
it is sensitive to correlations of all orders, which in part 
explains the success of the MID method in estimating 
neural filters from responses to natural stimuli. Here we 
apply this method for the first time to neural data, fo- 
cusing on the single-filter model, to address the question 
of whether and how V1 receptive fields adapt to natural
stimuli. 
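
To make the comparison between the three measures concrete, the sketch below computes, for one fixed filter, the change in mean (the quantity underlying the STA), the change in variance (underlying spike-triggered covariance), and the Kullback-Leibler distance in bits between the spike-triggered and full distributions of filter outputs. The histogram-based estimator, the bin count and the toy data are assumptions chosen for illustration; the example is built so that the mean shift is zero while the information is not, as for a cell that responds to both stimulus polarities.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def histogram(xs, lo, hi, n_bins):
    """Normalized histogram of xs over [lo, hi] with equal-width bins."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in xs:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    return [c / len(xs) for c in counts]


def compare_distributions(outputs, spikes, n_bins=3):
    """Return (mean shift, variance change, KL distance in bits)."""
    triggered = [x for x, s in zip(outputs, spikes) if s]
    lo, hi = min(outputs), max(outputs)
    p_all = histogram(outputs, lo, hi, n_bins)    # P(x) over the ensemble
    p_spk = histogram(triggered, lo, hi, n_bins)  # P(x | spike)
    kl = sum(q * math.log2(q / p) for q, p in zip(p_spk, p_all) if q > 0)
    return (mean(triggered) - mean(outputs),
            variance(triggered) - variance(outputs),
            kl)


# Toy cell that fires for large outputs of either sign.
outputs = [-1.0, -1.0, 0.0, 0.0, 1.0, 1.0]
spikes = [1, 1, 0, 0, 1, 1]
d_mean, d_var, kl_bits = compare_distributions(outputs, spikes, n_bins=3)
print(d_mean, d_var, kl_bits)
```

In this toy case the mean-based measure sees nothing, the variance-based measure sees an increase, and the information measure is log2(1.5) bits.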



I. RECEPTIVE FIELDS FROM NOISE VS. 
NATURAL SCENES 

We studied 40 simple cells (as characterized by responses to
optimal moving gratings) in anesthetized cat V1 (complex
cells can also be characterized by the MID method and will
be considered in a future publication). We probed these
neurons with natural and white noise inputs. These inputs
differ in two important respects. First, they have very
different pairwise correlations, which are described by the
power spectra. The power spectrum of a white noise ensemble
does not depend on either spatial or temporal frequency
within a certain range, while the power spectrum of natural
inputs depends on spatial frequency k as ~1/k^2 under a wide
variety of conditions (spatiotemporal statistics have
similar structure). Second, natural scenes have strong
statistical correlations beyond second order that cannot be
described by the power spectrum, as evident for example in
the much greater incidence of oriented edges in natural
scenes than in Gaussian noise with the same power spectrum.

To estimate spatiotemporal receptive fields or neural 
filters from responses to noise and natural stimuli, we ap- 
plied both the linear systems and information-theoretic 
methods. The resulting estimated filters and STAs for 
two example cells are shown in Fig. 1. With respect to 
responses to the noise ensemble, we found the filter for 
each cell either as the traditional STA or as the MID.
As expected for white noise stimuli, the two estimates
do not differ significantly from each other for the
illustrated cells or for most cells (p > 0.05 for 31 out of 40
cells; t-test, see Supplementary Methods); the remaining
differences can be attributed to the residual spatial
correlations in the white noise ensemble (cf. Fig. 3b). 
This agreement illustrates the basic validity of the MID 
method under circumstances where the STA offers an in- 
dependent unbiased estimate. 

For responses to the natural stimulus ensemble, we cal- 
culated the STA and corrected it for second-order cor- 
relations present in the natural ensemble to obtain a 
decorrelated STA (dSTA). This would describe the neu- 
ron's filter if the neuron were linear. Because this pro- 
cedure of correcting for stimulus correlations tends to 
amplify noise, we also calculated the dSTA using regu- 
larization to prevent such amplification - such decorrela- 
tion with regularization has been used in most previous 
work estimating neural filters from responses to natural 
signals. Finally, we estimated the filter from
natural inputs as the MID. As can be seen in Fig. 1, the 
MID produces an estimate of the filter for natural scenes 
that is much closer to the white noise filter than either the 
dSTA or the regularized dSTA. Across cells, the dSTA 
shows a greater difference from the white noise filter than 
does the natural ensemble MID, as judged by smaller cor- 
relation coefficients with either the noise ensemble STA 
or noise ensemble MID (40/40 cells, p < 10~^). This 
demonstrates that some of the differences between the 
neural filters obtained from natural and noise stimulation 
in the linear model are due to biases in the estimation 
of the natural filter that can be removed once the linear- 
nonlinear model is considered and the MID is computed. 
In Fig. 1, we also plot the nonlinear functions that show
spike probability as a function of filter output. They are 
similar in shape for the MIDs of the two ensembles, and 
this behavior seems to be typical across cells. 

We used the MIDs to estimate both the noise and nat- 
ural filters in what follows. We studied all simple cells 
with a non-zero filter to both natural and noise inputs. 

Despite the similarity of the filters obtained under the 
two conditions, cf. Fig. 1, a jackknife analysis of the 
errors in estimating the neural filters shows that the
differences between the filters derived from noise and natural
signals are statistically significant (p < 0.01) for all
cells. To investigate the source of these differences and to 
make connections with classic studies on neural responses 
to moving periodic patterns (gratings) of certain orienta- 
tions and spatial frequencies, we compute the spatiotem- 
poral Fourier transform of the filter in the two spatial 
FIG. 1: Filters and nonlinearities for two simple cells. Top to bottom: STA and MID for noise ensemble; STA, dSTA, 
dSTA with regularization, and MID for natural ensemble. Spatiotemporal receptive fields have three time frames covering the 
indicated interval (-133 to -33 ms). In the right-most column for each filter we plot the probability distribution of filter outputs 
in the stimulus ensemble (magenta) and the spike probability given the filter output (blue; values of the y axis refer to these 
probabilities). The color scale shows the filter in units of its average noise level (see Supplementary Methods). x-y scale bars: 
1°. Error bars show standard errors of the mean in all figures. 



dimensions and time. The position of the maximum of 
the Fourier transform at the grating temporal frequency 
is our prediction for the optimal grating orientation and 
spatial frequency for a particular neuron. We did not de- 
tect any systematic shifts in optimal orientation and only 
a small shift in optimal spatial frequency as assayed from 
noise filters, natural signals filters and grating stimuli, in 
agreement with previous findings using the regularized
dSTA; see Supplementary Discussion.

The most marked differences between the neural filters 
derived from natural vs. noise stimulation are seen by 
considering the entire shape of the spatial frequency tun- 
ing curves (Fig. 2 and Supplementary Figs 1 and 2) and 
not just the location of the single best spatial frequency. 
For each cell and temporal frequency, we calculated the 
spatial frequency profile along the cell's preferred stim- 
ulus orientation using interpolation of the filter's two- 
dimensional discrete Fourier transform. Note that our 
temporal resolution allowed analysis only at two temporal
frequencies (0 Hz and 10 Hz, in each of two opposite
directions of motion). Results at 10 Hz did not depend
on direction of motion so both directions were combined 
in Fig. 2, which shows the average tuning of the cells 
in our dataset. For low spatial frequencies sensitivity 
decreased (increased) to common (rare) inputs, while at 
middle and high spatial frequencies the sensitivity did 
not change. For example, at zero temporal frequency, 
low spatial frequencies are more common in the natu- 
ral than in the white noise stimulus ensemble (Fig. 2b). 
Correspondingly, neurons became less sensitive to those 
frequencies during stimulation with natural inputs than 
during stimulation with noise inputs (Fig. 2a). In the 
case of non-zero temporal frequencies the trend is re- 
versed, because the noise stimulus ensemble has more 
power at nearly all spatial frequencies than the natural 
stimulus ensemble (Fig. 2d, e). These changes in filter 
can be observed in the majority of cells, and are not sim- 
ply due to adaptation in a small subset of cells. This is 
FIG. 2: Neural filters compensate for changes in the input power spectrum. Average amplitude spectra of neural 
filters (a,d) and input ensembles (b,e) corresponding to natural (blue circles) and white noise (red circles) stimulation for 
temporal frequencies of 0 and 10 Hz. The spectra were taken along the optimal orientation for each cell by interpolating the 
discrete 2D Fourier transform. We use filled circles at frequencies where mean sensitivity was significantly different between 
the two ensembles (small circles for p < 0.05 and large for p < 0.01), and open symbols otherwise. (c, f) Plots of the product 
of the average neural filter and input ensemble amplitude spectra. 



shown in Supplementary Fig. 1, which illustrates the spa- 
tial frequency sensitivities of the two example cells whose 
receptive fields are shown in Fig. 1, and Supplementary 
Fig. 2, which shows scatter-plots of spatial frequency 
sensitivity of noise vs. natural filters across all cells. 



II. OPTIMAL FILTERING IN A NONLINEAR 
SYSTEM. 

In retrospect, such shifts in spatial frequency sensitivity
may be expected if neural coding is to be optimal for both of
two input ensembles whose power spectra differ as vastly as
those of white noise and natural stimuli (see Fig. 2b, e). In
general it is difficult to map optimal coding strategy from one
ensemble to another; however, it could be done if both of
the stimulus ensembles were Gaussian so that they were
entirely characterized by their power spectra. Suppose
a neuron uses filter $L_A$ and nonlinearity $f_A$ to optimally
encode Gaussian stimulus ensemble A with spatiotemporal
amplitude spectrum $P_A(k, \omega)$. What would then be
an optimal strategy to encode Gaussian ensemble B with
amplitude spectrum $P_B(k, \omega)$? One solution is to leave
the nonlinearity unchanged and to compensate for differences
in the input power spectra by changing neural
filter properties so that:

$$L_A(k, \omega) \cdot P_A(k, \omega) = L_B(k, \omega) \cdot P_B(k, \omega) \qquad (1)$$

This will leave unchanged all statistics of neuronal re- 
sponse, and so in particular will leave invariant any sta- 
tistical measures of optimality. Alternative strategies in- 
volving a change in nonlinearity cannot be optimal unless 
there are multiple optima, because if ensemble A has a 
unique optimum, then the above strategy will give the 
unique optimum for ensemble B. (Note that, in response 
to an overall change in contrast, the nonlinearity can be
rescaled, but this is equivalent to a rescaling of the
filter according to Eq. 1 with no change in nonlinearity.)
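
As a worked illustration of Eq. (1), the sketch below solves for the filter spectrum under ensemble B, given the filter spectrum under ensemble A and the two input amplitude spectra, frequency by frequency. The three sample frequencies, the flat spectrum for A, the 1/k amplitude spectrum for B and the filter values are hypothetical numbers chosen only to show the direction of the predicted change; they are not measurements from this study.

```python
def compensated_filter(L_A, P_A, P_B, eps=1e-12):
    """Solve Eq. (1), L_A * P_A = L_B * P_B, for L_B at each frequency.

    eps guards against division by zero where ensemble B has no power.
    """
    return [la * pa / max(pb, eps) for la, pa, pb in zip(L_A, P_A, P_B)]


# Hypothetical spatial frequencies (arbitrary units) and spectra.
k = [1.0, 2.0, 4.0]
P_A = [1.0, 1.0, 1.0]          # flat, white-noise-like amplitude spectrum
P_B = [1.0 / f for f in k]     # 1/k amplitude spectrum (assumed for illustration)
L_A = [0.5, 1.0, 0.5]          # filter sensitivity under ensemble A

L_B = compensated_filter(L_A, P_A, P_B)
print(L_B)  # [0.5, 2.0, 2.0]
```

Relative to the lowest frequency, sensitivity under B is boosted where B has less power, so that the product of filter and input spectra is the same for both ensembles.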

These conclusions about the receptive field and non- 
linearity apply only to Gaussian stimuli. The higher- 
order correlations present in natural scenes may both 
lead to deviations from Eq. (1) in neural filters and cause 
changes in the shape of the nonlinearity. But in practice, 
the changes in the shape of the nonlinearity are small, 
and changes in neural filters that do take place act to 
compensate for changes in the input power spectrum as 
predicted from Eq. (1) (Fig. 2c, f). These changes in 
frequency sensitivity occur primarily at low spatial fre- 
quencies. No changes are observed at mid-to-high spatial 
frequencies, resulting in significant deviations from Eq. 
(1) in the middle range of frequencies. We can only spec- 
ulate that other factors may limit the range of frequencies 
over which adaptation can occur. 
FIG. 3: Receptive field adaptation increases information
transmission. Bars show the mutual information between
spikes and outputs of either noise (blue, N) or natural
scenes (red, S) filter applied to natural scenes ensemble (solid)
or noise ensemble (pixelated). NS, white noise filter applied
to natural scenes ensemble; SS, natural scenes filter applied
to natural scenes ensemble; NN, white noise filter applied to
noise ensemble; SN, natural scenes filter applied to noise
ensemble. The information values are given in bits (a) or in
units of the total information carried by the arrival of a single
spike, $I_{\mathrm{spike}}$ (b).



III. ADAPTATION INCREASES 
INFORMATION TRANSMISSION 



The above optimal coding argument provides at least 
a qualitative explanation of observed receptive field 
changes. Most theories of optimal coding define opti- 
mality in information-theoretic terms. To test directly 
whether the information maximization argument applies 
to our data, we calculated the average mutual informa- 
tion between the filter output and the neural response; 
the response at a given time is simply taken as the presence
or absence of a single spike.

The changes in receptive fields act to increase the in- 
formation after changes in stimulus ensemble, and this 
information would be substantially reduced if receptive 
fields did not change with the ensemble. That is, the nat- 
ural filter carries more information about responses to the 
natural ensemble than to the noise ensemble (p < 10"^,
paired Wilcoxon two-tailed test), whereas the noise filter
carries more information about responses to the noise
ensemble than to the natural ensemble (p = 0.03). The average
information values across the population are shown
in Fig. 3, and scatter-plots on a cell-by-cell basis are
provided in Supplementary Fig. 4. Each filter produces
roughly equal information about responses to its own
ensemble: the difference in information values achieved
by applying the noise filter to the noise ensemble versus
applying the natural filter to the natural ensemble is
not significant (p = 0.18, paired Wilcoxon test). Each
filter produces substantially less information about
responses to the other ensemble (p < 10^'* for natural or
noise ensemble filtered with natural versus noise filter;
paired Wilcoxon tests), and there is no significant
difference between the swapped combinations (natural filter
applied to noise ensemble or vice versa, p = 0.06, paired
Wilcoxon test). We note that the changes in information



are not due to overfitting or other computational arti- 
facts, because information was calculated from responses 
to ensemble segments that were not used in calculating 
the filters, and the effects were not seen in data from 
a model linear-nonlinear cell with unchanging filter that 
was analyzed similarly, see Supplementary Information. 

In addition to considering information I in bits (Fig.
3a), we also measured information for each cell in units
of $I_{\mathrm{spike}}$, the information in the neuron's response (as
defined above) about the full stimulus (Fig. 3b). $I/I_{\mathrm{spike}}$
measures the fraction of the total possible information
that is captured by the single most informative filter
($I_{\mathrm{spike}}$ is a separate measurement that was available only
for a subset of cells, making the data set smaller). As
can be seen, the MID captures roughly 35% of the possible
information for simple cells. Each filter provides a
greater fraction of the overall information when applied
to its own ensemble than the other (p < 10"'' for natural
filter applied to natural vs. noise ensemble and for either
ensemble filtered with natural vs. noise filter; p = 0.05
for noise filter applied to natural vs. noise ensemble;
paired Wilcoxon test).



IV. DYNAMICS OF RECEPTIVE FIELD 
ADAPTATION 

Even though the best linear-nonlinear model system- 
atically changes with the stimulus ensemble, this does 
not establish that the neuron has changed its encoding 
strategy. The true encoding strategy may be complicated 
and nonlinear, so that even if it is static, the best linear- 
nonlinear estimate of it may change with the ensemble, 
much as the best linear approximation to a curve changes 
with position on the curve. 

The most direct method to distinguish between an 
adaptive strategy and a complex but static coding strat- 
egy would be to estimate the filter as a function of time 
and see it change. This method yields very poor time res- 
olution, because 5 min of data are needed to estimate 
the filter, so adaptation that occurs on a faster timescale 
cannot be seen. Nonetheless we tried this method and 
saw appropriate, if weak, adaptation to noise stimuli even 
on this long time scale (see Supplementary Fig. 5). To 
achieve finer time resolution, we studied adaptation by 
measuring changes with time in the information carried 
by the output of a single, static filter; this information 
can be estimated from ~ 30 s of data. We used the fol- 
lowing reasoning. If the coding is static, then the mutual 
information between this filter's output and the neuron's 
responses to a given ensemble should not systematically 
change in time. However, if the neuron's receptive field 
adapts to the stimulus ensemble, then this information 
may systematically change in time. In particular, we take 
the static filter to be that characterizing a neuron when 
it is well adapted to a given ensemble - say the natural 
ensemble. When the neuron is newly exposed to a natu- 
ral ensemble, the information carried by this filter should 
FIG. 4: Adaptation dynamics. a,b, The neural filter derived from the last half of natural stimulation is applied to the first 
half of natural (a) or to the noise (b) ensemble. Symbols show information (green, left y axis) and firing rate (blue, right y 
axis) averaged across cells, versus time. The solid line is an exponential fit; dashed lines show one standard deviation based 
on the Jacobian of the fit [p = 0.01 in a and p = 0.003 in b using an F-test with null hypothesis of no time dependence]. The 
taller (shorter) red bars show information for the natural filter applied to natural (noise) inputs (as in Fig. 3, but n = 45). The 
firing rates demonstrate that recordings are stable. 



increase with increasing time of exposure to the natural 
ensemble, as the neuron adapts so that the filter that it 
actually uses to encode incoming stimuli into spikes be- 
comes closer and closer to this static, fully adapted filter. 
Similarly, when the neuron is newly exposed to a noise 
ensemble, the information carried by this filter should 
decrease with increasing time of exposure to the noise 
ensemble, as the neuron's own filter adapts to the noise 
and becomes less and less like the fully adapted natural 
scenes filter. 

We derived filters from the last half of the 10-min pre- 
sentations of each stimulus ensemble, when the neuron 
would be best adapted to the given ensemble if adapta- 
tion occurs. We then applied these static filters to both 
noise and natural stimuli, and measured information be- 
tween spikes and filtered stimuli in successive 34-s periods 
during the first half of stimulus presentation (if the filter 
was derived from the second half of this stimulus) or in 
successive 68-sec periods during all of the presentation of 
the opposite ensemble. Most cells did not show signif- 
icant adaptation when considered individually, presum- 
ably due to the variability in measuring information over 
such brief time periods. However, averaging over the en- 
tire population of simple cells revealed clear adaptation 
over time, consistent with an adaptive coding strategy 
(Fig. 4). The information progressively increased with 
time when natural inputs were filtered with the neural 
filter derived from the natural stimulus ensemble (Fig. 
4a; see also Supplementary Discussion and Supplemen- 
tary Fig. 6), while the information decreased with time 
when that same filter was applied to noise inputs (Fig. 
4b). 

Fits of a single exponential to the average data demonstrate
that there is a statistically significant monotonic
change with time, with time constants $\tau$ = 42 ± 9 s for
adaptation to the natural ensemble and $\tau$ = 22 ± 2 min



for adaptation to the noise ensemble. These time con- 
stants are consistent with the fact that we could not de- 
tect adaptation to the natural ensemble with the 5-min 
time scale of direct filter measurements, but we could 
detect adaptation to the noise ensemble (Supplementary 
Fig. 5). Note, however, that the time constants are based 
on the assumption of exponential decay, and do not ex- 
clude the possibility of multiple time scales, including 
scales faster than we were able to measure, or of alterna- 
tive functional forms of decay. 
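
For readers who wish to experiment with this kind of analysis, a single-exponential fit to an information-versus-time curve can be sketched as follows. This is not the fitting routine used for Fig. 4: it is a minimal stand-in that scans a grid of candidate time constants and, for each, solves for offset and amplitude by ordinary least squares. The function name, the grid, and the noise-free synthetic curve (generated with a 42-s time constant purely as a known ground truth) are assumptions of the example.

```python
import math


def fit_exponential(t, y, taus):
    """Fit y ~ a + b * exp(-t / tau): scan tau, solve a and b linearly.

    Returns (tau, a, b) for the candidate tau with the smallest squared error.
    """
    n = len(t)
    best = None
    for tau in taus:
        e = [math.exp(-ti / tau) for ti in t]
        e_mean, y_mean = sum(e) / n, sum(y) / n
        s_ee = sum((ei - e_mean) ** 2 for ei in e)
        s_ey = sum((ei - e_mean) * (yi - y_mean) for ei, yi in zip(e, y))
        b = s_ey / s_ee                      # least-squares amplitude
        a = y_mean - b * e_mean              # least-squares offset
        sse = sum((a + b * ei - yi) ** 2 for ei, yi in zip(e, y))
        if best is None or sse < best[0]:
            best = (sse, tau, a, b)
    return best[1:]


# Synthetic, noise-free curve sampled at the centers of successive 34-s bins.
t = [17.0 + 34.0 * i for i in range(8)]
y = [0.2 - 0.1 * math.exp(-ti / 42.0) for ti in t]
tau, a, b = fit_exponential(t, y, taus=range(10, 101))
print(tau, round(a, 6), round(b, 6))
```

With real, noisy data the recovered time constant depends on the assumed functional form, which is the caveat raised above; a grid scan also makes it easy to inspect how flat the error is as a function of the time constant.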

We could not detect a significant trend with time in 
the information carried by the noise filter about either 
ensemble (see Supplementary Fig. 7). This is perhaps 
not surprising given that the average decrease in information
for the noise filter applied to the noise versus natural
ensembles was not significant (p = 0.14, unpaired
t-test), and that the slow time course of adaptation to
the noise ensemble suggests that the filter we tested was 
not fully adapted to it (see also legend of Supplementary 
Fig. 7). Nonetheless, the presence of significant mono- 
tonic changes in the expected directions for the natural 
scenes filter applied to each ensemble demonstrates that 
the neuron's coding strategy is adapting over time with 
exposure to a given ensemble. 



V. DISCUSSION 

Adaptation is ubiquitous throughout the nervous system,
and it occurs in many forms. In vision, adaptation
to luminance mean and variance (contrast) has been
observed in the retina, lateral geniculate
nucleus and primary visual cortex, and related
changes are observed in perception. In the framework
of our model, adaptation may affect the neural gain (the
nonlinear input-output function), or the spatiotemporal
filter itself. Adaptation of the gain to the mean and
variance of the stimulus ensemble (and perhaps to
higher-order statistics) serves to fit a neuron's dynamic range
to the dynamic range of the stimulus. In
addition, adaptation of the filter to the mean and
variance of the stimulus has been observed, and it
has been argued that such adaptation along with
adaptation to the stimulus covariance can serve to maximize
the information per spike in the neuron's response.
In general, filter adaptations are nearly instantaneous (<
0.1 s), while changes in gain can be more gradual (time
constants up to 10 s, and perhaps longer for some
components of adaptation to mean luminance). Here
we find an adaptive change in neural filters in response 
to stimulus statistics beyond the mean and variance, and 
one that occurs over much longer time scales than previ- 
ously found even for contrast gain changes. This suggests 
that the observed adaptation represents a new mecha- 
nism for optimal coding. 

Adaptation to the power spectrum could be considered 
a generalized form of contrast adaptation, in which dif- 
ferent frequency channels providing input to cortical cells 
differentially adapt their gains so that channels with more 
stimulus power show greater adaptation. Indeed, variation
of gain adaptation across different retinal pathways
has been observed. However, these observations,
and a recently reported pattern-specific component of
retinal adaptation, involved adaptation on significantly
faster time scales than observed here. Also, in the lateral
geniculate nucleus, adaptive changes between white
noise and natural stimulation were not observed in the
temporal domain, at least for a majority of cells. This
suggests that the adaptive changes reported here are of
cortical origin. A pattern-specific component of cortical
adaptation has been observed: for example, one that
differentially affects responses according to the difference of
the stimulus orientation, direction, or spatial frequency
from that of the adapting stimulus. At least
in one case, this adaptation has been observed to have
time constants on the order of a minute or longer. It
is possible that the present observations may share some 
underlying mechanisms with such pattern-specific adap- 
tation. 

Many recent studies have used versions of the linear
model or related models to estimate receptive fields
from responses to natural stimuli. Some
have reported that the estimates calculated from responses
to natural stimuli differ from those calculated
from responses to noise, whereas others found
no change in the major parameters of neural filters, such
as optimal stimulus orientation and spatial frequency. It
is not clear from these observations to what degree re- 
ported differences in neural filters are genuinely stimulus- 
induced or are due to biases in the estimation induced by 
the non-Gaussian statistics of natural stimuli together 
with the nonlinearity of the input-output function. The 
fact that the receptive field obtained for a given ensem- 
ble from the linear model best predicted responses to 



other examples of its own ensembleii^iiS^ suggests at least 
partially genuine differences, which is also supported by 
our results on spatial frequency adaptation. However, 
the fact that we found larger differences between filters 
obtained in the linear approximation (dSTA for natu- 
ral stimulus ensemble and STA for white noise ensem- 
ble) than between filters obtained in the linear-nonlinear 
model (MID for natural stimulus ensembles and STA or 
MID for white noise ensemble) suggests that biases also 
exist, and the new information maximization procedure 
used here removes these biases for real neurons, just as 
was demonstrated in numerical simulations^30.

We have found that V1 neurons adapt their filters to
stimulus statistics beyond the mean and variance. This 
filter adaptation occurs over 40 s to many minutes, sug- 
gesting it is not a consequence of previously described 
mechanisms of luminance or contrast adaptation. The 
adaptation serves to preserve information transmission 
and to reduce relative responses to stimulus components 
that are relatively more abundant in the stimulus en- 
semble, as predicted by optimal encoding arguments. It 
remains to be determined whether the neurons are adapt- 
ing to changes in power spectra, in higher-order statis- 
tics, or both. The gradual nature of adaptive changes 
and their correspondence to optimization principles sug- 
gests that it might be possible to predict the direction 
and degree of adaptation to stimulus sets with statistics 
intermediate between those of white noise and natural 
stimuli. Thus, there is hope for creating a unified picture 
of neural responses across various input ensembles. 



VI. METHODS 

All experimental recordings were conducted under a
protocol approved by the University of California, San
Francisco Committee on Animal Research with procedures previ-
ously described^. Spike trains were recorded using
tetrode electrodes from the primary visual cortex of anes- 
thetized adult cats and manually sorted off-line. Visual 
stimulus ensembles of white noise and natural scenes were 
each 546 s long. After manually estimating the size and 
position of the receptive field, neurons were probed with 
full-field moving periodic patterns (gratings). Cells were 
selected as simple if, under stimulation by a moving si- 
nusoidal grating with optimal parameters, the ratio of 
their response modulation (F1, that is, the amplitude of the
Fourier transform of the response at the temporal fre-
quency of the grating) to the mean response (F0) was
larger than one^34. The rest of the protocol typically con-
sisted of an interlaced sequence consisting of three dif- 
ferent noise input ensembles of identical statistical prop- 
erties, and three different natural input ensembles. The 
interval between presentations varied in duration as nec- 
essary to provide adequate animal care. All natural input 
ensembles were recorded in a wooded environment with 
a hand-held digital video camera in similar conditions 
on the same day, see Supplementary Movie. The noise 
ensembles were white overall, but the spatial frequency
spectrum was divided into eight circular bands, and each 
particular frame was limited to one band at random; this 
white noise design was intended to increase the number 
of elicited spikes. The mean luminance and contrast of 
the noise ensembles were adjusted to match those of the 
natural ensembles. Both noise and natural inputs were 
shown at 128x128 pixel resolution, with angular resolu- 
tion of approximately 0.12° per pixel. To calculate recep- 
tive fields, input ensembles were down-sampled to 32x32 
pixels. The receptive field center was determined from 
the maxima in the STAs for noise and natural ensem- 
bles and was set to the same position for analysis of both 
noise and natural inputs. A patch of 16x16 pixels was se- 
lected around the center (angular resolution of 0.48° per 
pixel) to make analysis computationally feasible and to 
minimize effects due to undersampling (we strove to have 
the number of spikes greater than the dimensionality of 
the receptive fields'^^). In all cases subsequent analysis 
of receptive fields verified that the selected patch fully 
contained the receptive field. These receptive fields were 
used in all quantitative analyses (Figs. 2-4). Examples in
Fig. 1 were computed and are shown at twice the
angular resolution to illustrate the finer structure of the
receptive field, as well as differences in performance of
the various methods.



VII. ACKNOWLEDGMENTS 

We acknowledge suggestions from W. Bialek on the 
design of experiments and subsequent data analysis. We 
thank M. Caywood, B. St. Amant, and K. MacLeod
for help with experiments. We thank P. Sabes, M. 
Kvale and S. Palmer for many helpful suggestions on 
statistical aspects of data analysis. Computing resources 
were provided by the National Science Foundation under 
the following NSF programs: Partnerships for Advanced 
Computational Infrastructure at the San Diego Super- 
computer Center through NSF cooperative agreement 
ACI-9619020, Distributed Terascale Facility (DTF) and 
Terascale Extensions: enhancements to the Extensible 
Terascale Facility. This research was supported through 
grant R01-EY13595 to K.M. from the National Eye In- 
stitute and by a grant from the Swartz Foundation and 
a career development award K25MH068904-02 from the 
National Institute of Mental Health to T.S.

Correspondence and Requests for materials should be 
addressed to: sharpee@phy.ucsf.edu 



^1 F. E. Theunissen, K. Sen, and A. J. Doupe, J. Neurosci. 20, 2315 (2000).
^2 M. P. Sceniak, M. J. Hawken, and R. Shapley, J. Neurophysiol. 88, 1363 (2002).
^3 M. J. Nolt, R. D. Kumbhani, and L. A. Palmer, J. Neurophysiol. 92, 1708 (2004).
^4 L. Maffei, A. Fiorentini, and S. Bisti, Science 182, 1036 (1973).
^5 R. Shapley and J. Victor, Vision Res. 19, 431 (1979).
^6 R. M. Shapley and C. Enroth-Cugell, Progress in Retinal Research 3, 264 (1984).
^7 I. Ohzawa, G. Sclar, and R. D. Freeman, J. Neurophysiol. 54, 651 (1985).
^8 A. B. Saul and M. S. Cynader, Vis. Neurosci. 2, 593 (1989).
^9 S. M. Smirnakis, M. J. Berry, D. K. Warland, W. Bialek, and M. Meister, Nature 386, 69 (1997).
^10 N. Brenner, W. Bialek, and R. R. de Ruyter van Steveninck, Neuron 26, 695 (2000).
^11 V. Dragoi, J. Sharma, and M. Sur, Neuron 27, 287 (2000).
^12 A. L. Fairhall, G. D. Lewen, W. Bialek, and R. R. de Ruyter van Steveninck, Nature pp. 787-792 (2001).
^13 D. Chander and E. J. Chichilnisky, J. Neurosci. 21, 9904 (2001).
^14 S. A. Baccus and M. Meister, Neuron 36, 909 (2002).
^15 A. Kohn and J. A. Movshon, Nat. Neurosci. 7, 764 (2004).
^16 S. G. Solomon, J. W. Peirce, N. T. Dhruv, and P. Lennie, Neuron 42, 155 (2004).
^17 J. D. Victor, J. Physiol. 386, 219 (1987).
^18 S. P. Brown and R. H. Masland, Nat. Neurosci. 4, 44 (2001).
^19 J. A. Movshon and P. Lennie, Nature 278, 850 (1979).
^20 D. G. Albrecht, S. B. Farrar, and D. B. Hamilton, J. Physiol. 347, 713 (1984).
^21 S. V. David, W. E. Vinje, and J. L. Gallant, J. Neurosci. 24, 6991 (2004).
^22 R. Baddeley, L. F. Abbott, M. C. A. Booth, F. Sengpiel, T. Freeman, E. A. Wakeman, and E. T. Rolls, Proc. R. Soc. Lond. B 264, 1775 (1997).
^23 F. Theunissen, S. David, N. Singh, A. Hsu, W. Vinje, and J. Gallant, Network 3, 289 (2001).
^24 D. L. Ringach, M. J. Hawken, and R. Shapley, Journal of Vision 2, 12 (2002).
^25 D. Smyth, B. Willmore, G. E. Baker, I. D. Thompson, and D. J. Tolhurst, J. Neurosci. 23, 4746 (2003).
^26 G. Felsen, J. Touryan, F. Han, and Y. Dan, PLoS Biol. 3, 1819 (2005).
^27 E. de Boer and P. Kuyper, IEEE Trans. Biomed. Eng. 15, 169 (1968).
^28 F. Rieke, D. Warland, R. R. de Ruyter van Steveninck, and W. Bialek, Spikes: Exploring the neural code (MIT Press, Cambridge, 1997).
^29 N. C. Rust, O. Schwartz, J. A. Movshon, and E. P. Simoncelli, Neuron 46, 945 (2005).
^30 T. Sharpee, N. Rust, and W. Bialek, Neural Computation 16, 223 (2004), see also physics/0212110, and a preliminary account in Advances in Neural Information Processing 15 edited by S. Becker, S. Thrun, and K. Obermayer, pp. 261-268 (MIT Press, Cambridge, 2003).
^31 D. L. Ruderman and W. Bialek, Phys. Rev. Lett. 73, 814 (1994).
^32 D. Field, Neural Comp. 6, 559 (1994).
^33 E. P. Simoncelli and B. A. Olshausen, Annu. Rev. Neurosci. 24, 1193 (2001).
^34 B. Skottun, R. De Valois, D. Grosof, J. Movshon, D. Albrecht, and A. Bonds, Vision Res. 31, 1079 (1991).
^35 D. J. Field, J. Opt. Soc. Am. A 4, 2379 (1987).
^36 D. W. Dong and J. J. Atick, Network: Comput. Neural Syst. 6, 345 (1995).
^37 N. Brenner, S. P. Strong, R. Koberle, W. Bialek, and R. R. de Ruyter van Steveninck, Neural Computation 12, 1531 (2000), see also physics/9902067.
^38 C. Blakemore and F. W. Campbell, J. Physiol. 200, 11 (1969).
^39 M. N. Kvale and C. E. Schreiner, J. Neurophysiol. 91, 604 (2004).
^40 M. J. Wainwright, Vision Res. 39, 3960 (1999).
^41 J. J. Atick and A. N. Redlich, Neural Comput. 4, 196 (1992).
^42 T. Hosoya, S. A. Baccus, and M. Meister, Nature 436, 71 (2005).
^43 Y. Dan, J. J. Atick, and R. C. Reid, J. Neurosci. 16, 3351 (1996).
^44 A. A. Emondi, S. P. Rebrik, A. V. Kurgansky, and K. D. Miller, J. Neurosci. Methods 135, 95 (2004).
^45 H. Barlow, in Sensory Communication, edited by W. Rosenblith (MIT Press, Cambridge, 1961), pp. 217-234.
^46 H. Barlow, Network: Comput. Neural Syst. 12, 241 (2001).
^47 J. J. Atick and A. N. Redlich, Neural Comput. 2, 308 (1990).
^48 H. Barlow, in Vision: Coding and Efficiency, edited by C. Blakemore (Cambridge University Press, Cambridge, UK, 1990), pp. 363-375.
^49 H. Barlow and P. Foldiak, in The computing neuron, edited by R. Durbin, C. Miall, and G. Mitchinson (Addison-Wesley, New York, 1989), pp. 54-72.
^50 D. M. Coppola, H. R. Purves, A. N. McCoy, and D. Purves, PNAS 95, 4002 (1998).
^51 S. P. Strong, R. Koberle, R. R. de Ruyter van Steveninck, and W. Bialek, Phys. Rev. Lett. 80, 197 (1998).



VIII. SUPPLEMENTARY INFORMATION.



A. Supplementary Discussion. 

Optimal filtering in a nonlinear system. The sim- 
ple argument leading to Eq. (1) may appear reminiscent 
of the redundancy reduction principle^!^'^'^'^. How- 
ever we do not assume that the response is linear, impose 
a particular constraint, or specify the optimality mea- 
sure. We simply assume that the optimality measure, 
whatever it may be, is preserved under a change in en- 
semble. Due to the generality of this argument, we can- 
not make predictions for the optimal shape of frequency 
tuning, only for the relative changes in tuning upon a 
change in the input power spectra. For a linear system, 
the redundancy reduction arguments predic t ^^'"^^1^^ that 
neural filters should completely remove second-order cor- 
relations present in the input ensembles, i.e. the product 
L(k)P(k) should be constant across frequencies for suf-
ficiently small frequencies for any ensemble. Although 
this argument may reasonably describe subcortical vi- 
sual processing^ii^i^, it does not appear to describe vi- 
sual cortex either in response to natural stimuli or to
noise (Fig. 2c, f), where L(k)P(k) depends on k. There-
fore nonlinearities of simple cells and/or alternative opti- 
mization principles appear essential in describing optimal 
filter properties in the primary visual cortex. 

In the Discussion of the main text, we point out 
that the adaptation observed here may share some un- 
derlying mechanisms with previous observations of cor- 
tical pattern-specific adaptation. Indeed, it has been 
proposed^SiiSiiiS that such pattern-specific adaptation 
arises from anti-Hebbian or decorrelating mechanisms 
that would more generally lead to adaptation to the stim- 
ulus power spectrum like that observed here. These mod- 
els of adaptation^i^i^ are closely related to the redun- 
dancy reduction arguments just discussed and, more gen- 
erally, to principles of optimal encoding^i^i^i^i^ that 
have been proposed to govern the design and operation of 
the nervous system. Despite the specific disagreements 
just discussed, our results support these general ideas in 
two respects. First, we have found that adaptation acts 
to reduce relative responsiveness to patterns that have 
relatively greater stimulus power, as these theories pre- 
dict. Second, we have found that neural filters adapt to 
changes in stimulus ensemble in a manner that increases 
the information transmitted, relative to the information 
that would be transmitted if filters did not adapt (as seen 
by the decreased information when filter and ensemble 
are swapped, Fig. 3).

The optimality argument (1) for a nonlinear system 
analyzing Gaussian inputs predicts that the nonlinearity 
does not change its functional form. This is supported 
in our data by the fact that the average information val- 
ues are roughly equal under natural and noise stimula- 
tion. The information I can be rewritten in terms of
the nonlinear function f(x) of the filter output x and
the probability P(x) that the filter output has value x:
$I = \int dx\, P(x) f(x) \log_2 f(x)$. One way to preserve this
sum is to use the strategy of our optimality argument:
to leave P(x) unchanged, which for a Gaussian ensem-
ble is accomplished by changing the filter according to
Eq. 1, and to leave the nonlinearity f(x) unchanged.
The extent to which this strategy is followed by our two
example cells can be seen in Fig. 1 and Supplementary
Fig. 3: the pink curves illustrate P(x), while the blue
curves illustrate f(x), which in Fig. 1 is also scaled by
the firing rate. As can be seen by comparing the curves
for the noise MID to that for the natural MID, the curves
are at least roughly preserved.
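
For reference, the rewriting of the information used in this paragraph follows in one step from Eqs. (5) and (6) of the Supplementary Methods. With the shorthand x for the filter output, Eq. (5) states that the spike-conditional distribution equals the nonlinearity times the prior distribution, and substituting this into the Kullback-Leibler form of Eq. (6) gives the expression quoted above. The lines below only restate those two equations; the shorthand x is the sole notational assumption.

```latex
\begin{align}
I &= \int dx\, P(x|\mathrm{spike}) \log_2 \frac{P(x|\mathrm{spike})}{P(x)},
  \qquad f(x) = \frac{P(x|\mathrm{spike})}{P(x)} \\
  &= \int dx\, P(x)\, f(x) \log_2 f(x).
\end{align}
```

Written this way, it is apparent that leaving both P(x) and f(x) unchanged across ensembles leaves I unchanged as well.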

Sensitivity of simple cells to multiple stimulus 
dimensions. We find that even simple cells in primary 
visual cortex are sensitive to more than one stimulus di- 
mension, in agreement with other recent work^26,29. A
single filter corresponds to a single stimulus dimension;
the filter output tells the strength of the stimulus along
that dimension. The ratio I/I_spike (Fig. 3b) tells the pro-
portion of the information encoded about the stimulus in 
the neuron's spikes that can be accounted for by the out- 
put of the most informative filter'^'^. For simple cells, the 
dominant filter accounts for only about 35% of the over-
all information. Thus, other stimulus dimensions must
significantly influence the neuron's firing^10,26,29. Pre-
sumably all of these relevant dimensions also shift with
changes in stimulus ensemble, of which we analyzed here
only the dominant one. It is also possible that adaptive
changes in the structure of each of the relevant dimen-
sions will change their relative importance for eliciting
a spike. In particular, the dominant filter for one input
ensemble might become secondary in encoding the other
input ensemble. The fact that we did not see qualitative
changes in the structure of the dominant filter between
natural and noise stimulation suggests that such shifts
in the relative role of dimensions are not common. Fu-
ture studies will extend the adaptation analysis to include
other relevant dimensions beyond the dominant filter.

SUPPLEMENTARY FIG. 1: This figure shows the spatial frequency profiles of receptive fields from the two example cells of
Fig. 1. Spatial frequency sensitivity at zero temporal frequency (a,c) and at 10 Hz (b,d). Red indicates filter derived from
responses to noise ensemble, blue indicates filter derived from responses to natural ensemble. The second of the two example
cells is typical in all respects. The first of the two cells is atypical in that it did not change its sensitivity at low spatial
frequency between natural and noise stimulation at 0 Hz, but exhibited an appropriate change in its tuning at 10 Hz, see
Supplementary Fig. 2.

Optimal spatial frequency and orientation un-
der natural and noise stimulation. Filters derived
from noise and natural stimuli had similar optimal orien- 
tation and spatial frequency. The optimal values were ob- 
tained as the position of the maximum of the 2D Fourier 
transform in space at the temporal frequency of the grat- 
ing (2Hz). We found a small but statistically significant 
shift in the optimal spatial frequency, with filters derived 
from noise inputs having a 21% (± 3% s.d.) higher value 
of the optimal spatial frequency than filters derived from 
natural inputs {p < 10~^). This shift in optimal spatial 
frequency was small enough that neither the noise en- 
semble estimate nor the natural ensemble estimate was 
significantly different from direct measurements of the 
preferred spatial frequencies of these cells with gratings. 
We note that the measurements with gratings were done 
separately, before exposure to the noise or natural en-
sembles, and do not represent tests of grating spatial fre-
quency sensitivity in the states of adaptation to white 
noise or natural stimuli. We note also that our con- 
clusions about optimal coding depend on the sensitivity 
throughout the entire range of spatial frequencies and 
not on the position of the maximum of the spatial fre- 
quency tuning curve (the "optimal" spatial frequency) 
for a particular cell. 

In agreement with previous finding o^^i^^ , we did not 
see statistically significant changes in optimal stimulus 
orientation between grating, natural ensemble, or noise 
ensemble estimates. Natural stimuli have anisotropic 
power spectra with increased power at horizontal and 
vertical orientations^, and therefore one might have ex- 
pected some shifts in optimal stimulus orientation away 
from horizontal or vertical for the natural filter relative 
to the noise filter. Adaptation to orientation is strongest 
when the difference between the preferred orientation of 
the neuron and the adapting orientation is between 20- 
60 degrees, and acts to shift the preferred orientation 
away from the adapting stimulus^11. Thus, shifts due
to over-representation of vertical and horizontal orien- 
tations would both tend to occur on neurons preferring 
oblique orientations, and would be in opposite directions. 
We speculate that the two effects tend to cancel. 

Dynamics of Adaptation to Natural Stimuli. 

Here we argue against certain artifactual explanations 
of Fig. 4a. It could be argued that the increase of infor- 
mation with time seen in Fig. 4a may occur because of 
correlations between the stimuli used for the information 
calculation (the "test set") and those used in calculat- 
ing the filters themselves (the "training set"). Natural 
movies tend to have correlations that diminish in time as 
a power law rather than an exponential'^^i^^i^, and in 
that sense are long-lasting. The training set was the last 
half of the movies, so it might be argued that, as time pro- 
gresses from the beginning of the movies, the correlation 
of the test set with the training set would increase and 
this might explain the increase in information. One ar- 
gument against this explanation is that information sat- 
urates after the first quarter of stimulus presentations, 
whereas the correlation with the training set would con- 
tinue to increase throughout the first half. We tested this 
explanation more directly by using an alternative train- 
ing set. We calculated the filters from the middle half of 
the movies (136 to 410 sec) and then calculated informa- 
tion on the first quarter and the last quarter. Now the 
first quarter and the last quarter are equally distant in 
time from the training set, and so if this explanation were 
correct we would expect them to be mirror images of each 
other: information would go up during the first quarter 
and go down by an equal amount during the last quar- 
ter. On the contrary, and in support of the adaptation 
argument, we see the same rise in information during 
the first quarter as before, even though the first quar- 
ter is now much closer in time to the training set, and 
we see no fall in information during the last quarter, cf. 
Supplementary Fig. 6a. An exponential fit gave a time
constant of 55 ± 9 s, which agrees with the time constant
of 42 ± 9 s derived from information during the first half 
of the data, cf. Fig. 4. Also, against the more general ar- 
gument that the rise or fall in information in Fig. 4 might 
be due to some non-stationarity in the stimulus movies, 
we show that relevant stimulus components, such as the 
mean and the standard deviation of the outputs of the 
neural filters applied to these movies, are stable, cf. Sup- 
plementary Fig. 6 (b-e). 



B. Supplementary Methods. 

Dataset Selection. The present dataset was obtained
from 4 animals and included 133 single units which were
clustered using a manual spike sorter. For 85 of the 133 
neurons, a reliable non-zero filter was obtained from nat- 
ural inputs, as judged by visual inspection. We found 
that this subjective criterion correlated well with an ob- 
jective criterion of having a significantly positive infor- 
mation value for the filter applied to its own ensemble 
(after finite-size corrections^ are applied). The informa- 
tion was positive for all 85 cells, and exceeded its stan- 
dard deviation in 81/85 cells. We used the latter criterion 
to select the dataset of 71 cells with reliable filter esti- 
mates to both noise and natural stimuli, of which 40 were 
classified as simple based on their responses to moving 
sinusoidal gratings of optimal orientation and spatial fre- 
quency. Specifically, simple cells were those with a ratio
F1/F0 > 1, where F1 is the response modulation (Fourier
component at the frequency of the stimulus grating) and
F0 is the mean response to the optimal grating. Because
results of Fig. 4 are based only on natural stimuli filters, 
we have included 5 additional simple cells for which the 
natural stimulus filter was reliable and noise stimulus fil- 
ter was not. 
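
The simple-cell criterion above can be illustrated with a short sketch. It assumes the response to the optimal grating has been binned into an evenly sampled firing-rate trace covering a whole number of grating cycles. The function name `f1_f0_ratio`, the input format, and the amplitude convention (twice the magnitude of the Fourier component divided by the number of samples) are assumptions made for illustration and are not specified in the text.

```python
import cmath
import math

def f1_f0_ratio(rates, n_cycles):
    """Ratio of response modulation (F1) to mean response (F0).

    rates    -- evenly sampled firing rates spanning exactly n_cycles
                periods of the drifting grating (hypothetical input format)
    n_cycles -- integer number of grating periods in the trace
    """
    n = len(rates)
    f0 = sum(rates) / n  # mean response
    # Discrete Fourier component at the temporal frequency of the grating.
    component = sum(r * cmath.exp(-2j * math.pi * n_cycles * t / n)
                    for t, r in enumerate(rates))
    f1 = 2.0 * abs(component) / n  # amplitude of the modulation
    return f1 / f0

# Half-wave rectified sinusoid: an idealized simple-cell response.
simple_like = [max(0.0, math.sin(2 * math.pi * 4 * t / 400)) for t in range(400)]
# Weak modulation on a high baseline: an idealized complex-cell response.
complex_like = [10.0 + math.sin(2 * math.pi * 4 * t / 400) for t in range(400)]

print(f1_f0_ratio(simple_like, 4) > 1.0)   # passes the simple-cell criterion
print(f1_f0_ratio(complex_like, 4) > 1.0)  # fails the simple-cell criterion
```

Under this convention a perfectly half-wave rectified sinusoid has a ratio close to pi/2, which is why a threshold of one separates strongly modulated from weakly modulated responses.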

Response Reconstruction: Neural Filters and 
Corresponding Nonlinearities. In the framework of 
the LN model, the probability of response to a particular 
input S is given by an arbitrary nonlinear function f
which only depends on the product of the input signal S
and the neural filter L:

$$ f = f(\mathbf{L} \cdot \mathbf{S}). \tag{2} $$

More generally, reconstruction might require description 
in terms of a nonlinear function of the outputs of several 
filters, or curved subspaces instead of a strictly linear 
projection between signals and filters. However, in this 
paper we focus on the analysis of properties of the dom- 
inant filter L of the LN model obtained with noise or 
natural inputs. We note that the assumption of a single 
linear filter is more general than the assumption that the 
cell is linear overall, because the input/output function 
can be strongly nonlinear and is usually well described 
by a threshold or threshold-linear function. 

In the case of white noise inputs, the linear filter can be 
found using the reverse correlation method, also known 
as the spike-triggered average (STA):

$$ \mathbf{e}_{\mathrm{STA}} = \langle \mathbf{S}\, P(\mathrm{spike}|\mathbf{S}) \rangle - P(\mathrm{spike}) \langle \mathbf{S} \rangle, \tag{3} $$

where the expectations are taken over the stimulus en-
semble probability distribution P(S). In other words, the
STA vector is computed by taking the average stimulus 
weighted by the number of spikes it elicits and subtract- 
ing the average stimulus multiplied by the overall number 
of spikes. The magnitude of the filter is irrelevant, be- 
cause its change can be accommodated by an appropriate 
rescaling of the input-output function (2), which converts 
stimulus components along the relevant filter into spike 
probability. Therefore, we normalize all of the derived 
filters to unit length or measure them with respect to 
the noise level. 
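
As a rough illustration of Eq. (3) and the normalization to unit length, the sketch below computes the STA from sampled data. It is a minimal pure-Python example under assumed inputs (a list of stimulus vectors and a list of spike counts per time bin); the function name `spike_triggered_average` and the simulated threshold neuron are hypothetical and serve only as a sanity check.

```python
import math
import random

def spike_triggered_average(stimuli, spikes):
    """Eq. (3) on sampled data, followed by normalization to unit length.

    stimuli -- list of stimulus vectors, one per time bin
    spikes  -- list of spike counts, one per time bin
    """
    n_bins = len(stimuli)
    dim = len(stimuli[0])
    rate = sum(spikes) / n_bins  # plays the role of P(spike)
    mean_stim = [sum(s[i] for s in stimuli) / n_bins for i in range(dim)]
    weighted = [sum(n * s[i] for s, n in zip(stimuli, spikes)) / n_bins
                for i in range(dim)]
    sta = [w - rate * m for w, m in zip(weighted, mean_stim)]
    norm = math.sqrt(sum(v * v for v in sta))
    return [v / norm for v in sta] if norm > 0 else sta

# Toy check: a threshold (LN) neuron driven by white Gaussian noise.
random.seed(0)
true_filter = [1.0, 0.0, 0.0, 0.0]
stimuli = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5000)]
spikes = [1 if sum(l * x for l, x in zip(true_filter, s)) > 0.5 else 0
          for s in stimuli]
sta = spike_triggered_average(stimuli, spikes)
overlap = sum(a * b for a, b in zip(sta, true_filter))
print(round(overlap, 3))  # close to 1 when the STA recovers the filter
```

For white Gaussian inputs the normalized STA should align closely with the model filter, which is the property the reverse correlation method relies on.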

If inputs are taken from a Gaussian distribution with 
correlations (colored noise), then the linear filter can be 
estimated by computing the STA according to Eq. (3) 
with a subsequent correction for input correlations. The 
decorrelated STA (dSTA) is obtained by multiplying the 
STA with the inverse of the stimulus covariance matrix 

$$ \mathbf{e}_{\mathrm{dSTA}} = C^{-1} \mathbf{e}_{\mathrm{STA}}. \tag{4} $$

In the case of correlated Gaussian inputs, the dSTA fil- 
ter Eq. (4) represents the solution of both the purely 
linear model and the LN model. This is no longer true 
for natural inputs, which are not Gaussian^. Therefore 
we calculate and treat the dSTA for the natural ensem- 
ble as the prediction of the purely linear model. It is 
known that higher signal-to-noise ratios and smoother 
filters can be achieved by various forms of regulariza- 
tion of the decorrelation process, including low-pass fil- 
tering the STA or imposing a high-frequency cutoff on 
the covariance matrix^i^i^. The increase in predictive 
power upon such regularization happens for three rea- 
sons. First, due to finite data or simply the nature of 
the stimulus ensemble, the covariance matrix might be 
singular or nearly so, so that its inversion would result 
in uncontrollably large eigenvalues for high frequencies 
where power in the stimulus ensemble is small. We have 
found that this is not the case for our covariance matrix: 
calculation of the dSTA according to Eq. (4) without any 
regularization, in numerical simulations for model linear 
cells, led to excellent agreement between the dSTA and 
the filter of the model cell with correlation coefficients 
> 0.99^30 (and unpublished data). Second, due to fi-
nite amounts of data, there is noise in the estimation 
of the STA. If this noise has a relatively flat spectrum, 
then at high frequencies where signal in the true STA is 
low, decorrelation may preferentially amplify noise rather 
than signal. Again, our results with the linear model 
with a finite number of spikes (e.g. 1000 spikes) suggest 
that this is not a problem, although we cannot be cer- 
tain that the noise problem is not worse for real nonlinear 
neurons. Third, because the dSTA is a biased estimate
of the filter of an LN neuron probed with natural scenes,
the estimate might be improved by deviating from the
linear model. This can be done by adding a parameter 
(a low-pass cutoff) and tuning this parameter on a cell- 
by-cell basis to maximize predictive power of the result- 
ing filter— However, it is not clear to what degree a 
change in just one parameter could account for all devia- 
tions between filters of the fully linear model and those of 
the LN framework. For all of these reasons, we refrained 
from regularization in our calculations of the dSTA ex- 
cept in the illustrations of example cells in Fig. 1; we oth- 
erwise treated the dSTA calculated by Eq. (4) as the pre- 
diction of the fully linear model. It should also be noted 
that the inclusion of an ad-hoc low-pass filter parameter 
would make it impossible to reliably estimate the higher- 
frequency parts of the filter; this, along with the bias 
of the unregularized dSTA, is why the MID method was 
necessary for us to assay changes in the spatial frequency 
tuning across ensembles. In Fig. 1, for comparison pur- 
poses, we illustrate both regularized and unregularized 
forms of the dSTA. Regularization was based on selecting 
a cutoff on the eigenvalues of the covariance matrix C be- 
low which none of the eigenvalues with the corresponding 
eigenvectors contributed to the inverse in Eq. (4), 
making it a pseudo-inverse^^i^. For each possible value 
of the cutoff parameter, the dSTA vector was calculated 
according to Eq. (4) based on a trial set using 7/8 of the 
data. The optimal cutoff value was selected as that for 
which the corresponding dSTA provided maximal infor- 
mation on the remaining 1/8 of the data designated as a 
test set. 
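
The unregularized form of Eq. (4) can be sketched as follows. This is a small pure-Python illustration, assuming a low-dimensional correlated Gaussian stimulus and a simulated threshold neuron; the helper names `solve_linear` and `sta_and_dsta` are hypothetical. The eigenvalue cutoff used for the regularized illustrations in Fig. 1 is not implemented here: the covariance matrix is inverted directly by Gaussian elimination.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def sta_and_dsta(stimuli, spikes):
    """Return the (unnormalized) STA of Eq. (3) and the dSTA of Eq. (4)."""
    n_bins, dim = len(stimuli), len(stimuli[0])
    mean = [sum(s[i] for s in stimuli) / n_bins for i in range(dim)]
    centered = [[s[i] - mean[i] for i in range(dim)] for s in stimuli]
    # Spike-weighted average of the mean-subtracted stimulus, equal to Eq. (3).
    sta = [sum(n * c[i] for c, n in zip(centered, spikes)) / n_bins
           for i in range(dim)]
    cov = [[sum(c[i] * c[j] for c in centered) / n_bins for j in range(dim)]
           for i in range(dim)]
    return sta, solve_linear(cov, sta)  # C^{-1} e_STA, no regularization

# Correlated Gaussian input; the model neuron only looks at component 0.
random.seed(1)
stimuli = []
for _ in range(20000):
    z0, z1 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    stimuli.append([z0, 0.8 * z0 + 0.6 * z1])
spikes = [1 if s[0] > 0.5 else 0 for s in stimuli]
sta, dsta = sta_and_dsta(stimuli, spikes)
print(sta[1] / sta[0], dsta[1] / dsta[0])
```

In this toy case the STA is pulled toward the correlated component, while the dSTA has a second component near zero, consistent with the statement that for correlated Gaussian inputs the dSTA solves the LN model.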

In addition to the above methods, we also derived 
neural filters using the method of most informative 
dimensions^30, see next section. For all of the above meth-
ods, jackknife analysis of neural filters was performed: 
8 filters were computed, each with 1/8 of the data left 
out. When information was computed for a filter on its 
own ensemble, it was calculated only on this 1/8 of the 
data that was not used for computing the filter, except 
in Fig. 4 where a single filter was calculated from 1/2 
of the data and information was calculated on segments 
of the other half. In all other cases, information values 
reported are an average over the 8 values found with the 
8 jackknife estimates. To establish statistical significance 
of the difference between filters derived with any two dif- 
ferent methods and/or two stimulus ensembles, all 16 of 
the corresponding jackknife estimates (8 for each combi- 
nation of method and ensemble) were projected on the 
direction of the difference between the mean filters de- 
scribing the two groups, and an unpaired Student's t-test
was used on these projections. To calculate the signal-to- 
noise level of receptive fields shown in Fig. 1, we compute 
the average standard deviation across all components of 
the receptive field across all the jackknife estimates (nor- 
malized to unit length) and display receptive field values 
relative to that noise level. 
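
The projection step of the significance test described above can be sketched as follows. This is a minimal illustration, assuming each group is given as a list of jackknife filter estimates (plain lists of numbers); the function names are hypothetical. Only the pooled-variance t statistic and its degrees of freedom are computed; converting them to a p-value requires the Student t distribution, which is left out here.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def projection_t_statistic(filters_a, filters_b):
    """Unpaired (pooled-variance) t statistic on projections of jackknife
    filters onto the difference between the two group means."""
    mean_a, mean_b = mean_vector(filters_a), mean_vector(filters_b)
    direction = [a - b for a, b in zip(mean_a, mean_b)]
    proj_a = [sum(x * d for x, d in zip(f, direction)) for f in filters_a]
    proj_b = [sum(x * d for x, d in zip(f, direction)) for f in filters_b]
    n_a, n_b = len(proj_a), len(proj_b)
    avg_a, avg_b = sum(proj_a) / n_a, sum(proj_b) / n_b
    ss_a = sum((p - avg_a) ** 2 for p in proj_a)
    ss_b = sum((p - avg_b) ** 2 for p in proj_b)
    dof = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / dof
    t = (avg_a - avg_b) / math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return t, dof

# Two hypothetical groups of 8 jackknife estimates each.
group_a = [[1.0 + 0.01 * i, 0.0] for i in range(8)]
group_b = [[0.0, 1.0 + 0.01 * i] for i in range(8)]
t_value, dof = projection_t_statistic(group_a, group_b)
print(t_value, dof)
```

Projecting onto the difference of the means reduces the comparison of high-dimensional filters to a one-dimensional two-sample test, which is the design choice the paragraph describes.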

Once the filter L has been obtained as either the STA 
(3), dSTA (4), or the MID^30, we can calculate the non-
linear input-output function (2) directly from the data. 

SUPPLEMENTARY FIG. 2: Spatial frequency sensitivity on a cell-by-cell basis for the first 9 spatial frequencies from Fig. 2
(here called k1 to k8 from lowest to highest) for temporal frequencies of 0 and 10 Hz respectively. P-values on top of each
graph show significance in sensitivity differences of filters derived from noise vs. natural stimulation. Color for each cell
codes sensitivity to noise filter at lowest frequency (k1) and is retained in the plots of higher frequencies. The two example
cells of Supplementary Fig. 1 are marked as a and an 'X' respectively. Note that the cell marked by a is atypical in
its behavior at 0 Hz.

According to its definition it is given by the normalized
spike probability given the stimulus S:

$$ f(\mathbf{S} \cdot \mathbf{L}) = \frac{P(\mathrm{spike}|\mathbf{S})}{P(\mathrm{spike})}. $$

When working in the framework of the linear-nonlinear
model we assume that the spike probability only depends
on stimulus components along the filter L of interest:
P(spike|S) = P(spike|S · L). Therefore the nonlinear
input/output function can also be written as:

$$ f(\mathbf{S} \cdot \mathbf{L}) = \frac{P(\mathrm{spike}|\mathbf{S} \cdot \mathbf{L})}{P(\mathrm{spike})}. $$

The last expression can be transformed using Bayes' rule:

$$ f(\mathbf{S} \cdot \mathbf{L}) = \frac{P(\mathbf{S} \cdot \mathbf{L}|\mathrm{spike})}{P(\mathbf{S} \cdot \mathbf{L})}. \tag{5} $$


That is, the nonlinear input/output function f is eval-
uated as the ratio of the probability distribution of stim-
ulus components along the filter L conditional on a spike,
P(S · L|spike), to the probability distribution of stimulus
components along L in the full ensemble, P(S · L). Both
of the probability distributions are readily available from
the experimental data.
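
A minimal sketch of this histogram ratio is given below. It assumes the stimulus projections onto the filter have already been computed, one per time bin, together with the spike count in each bin. The function name `nonlinearity` is hypothetical, and the default of 21 bins is borrowed from the binning described for the information calculation rather than stated for this step.

```python
def nonlinearity(projections, spikes, n_bins=21):
    """Estimate f(x) = P(x|spike) / P(x) on a histogram of projections.

    Returns one value per bin; bins never visited by the stimulus are None.
    """
    lo, hi = min(projections), max(projections)
    width = (hi - lo) / n_bins or 1.0  # guard against a degenerate range
    count_all = [0.0] * n_bins    # histogram for P(x)
    count_spike = [0.0] * n_bins  # spike-weighted histogram for P(x|spike)
    for x, n in zip(projections, spikes):
        b = min(int((x - lo) / width), n_bins - 1)
        count_all[b] += 1.0
        count_spike[b] += n
    total, total_spikes = len(projections), sum(spikes)
    return [(cs / total_spikes) / (ca / total) if ca > 0 else None
            for cs, ca in zip(count_spike, count_all)]

# Toy threshold neuron: one spike whenever the projection is positive.
projections = [i / 100.0 for i in range(-100, 101)]
spikes = [1 if x > 0 else 0 for x in projections]
f = nonlinearity(projections, spikes)
print(f[0], f[-1])
```

For the toy threshold neuron the estimate is zero in the lowest bin and equals the inverse of the overall spike probability in the highest bin, as expected from Eq. (5).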

Reconstruction of Receptive Fields as Most 
Informative Dimensions. The justification for the 
method of most informative dimensions as a way to cal- 
culate neural receptive fields is described elsewhere^30,
where performance of the method is illustrated on model 
visual and auditory neurons. For the convenience of the 
reader we describe here the methodology of maximizing 
information to find the receptive fields. It was shown that 
the information between the output of a particular vector 
L in the input space and the neuron's response, regarded 
as a spike or no spike in each time bin, can be computed, 
to lowest order in the probability P(spike) of a spike in 
the time bin, as the Kullback-Leibler distance between
the probability distributions P(x) and P(x|spike):

$$ I(\mathbf{L}) = \int dx\, P_{\mathbf{L}}(x|\mathrm{spike}) \log_2 \frac{P_{\mathbf{L}}(x|\mathrm{spike})}{P_{\mathbf{L}}(x)}, \tag{6} $$

where P_L(x) is the probability distribution of stimulus
projections x onto the vector L in the input ensemble,
and P_L(x|spike) is the probability distribution of stim-
ulus projections x onto the vector L among inputs that
led to a spike. We compute these two probability dis-
tributions as histograms in 21 bins covering the range of
projection values (the same number of bins was used in
finding MIDs from neural responses to noise and natural
ensemble). For each trial vector, we also compute the
gradient of information as:

$$ \nabla_{\mathbf{L}} I = \int dx\, P_{\mathbf{L}}(x) \left[ \langle \mathbf{S}|x, \mathrm{spike} \rangle - \langle \mathbf{S}|x \rangle \right] \frac{d}{dx} \left[ \frac{P_{\mathbf{L}}(x|\mathrm{spike})}{P_{\mathbf{L}}(x)} \right], \tag{7} $$


where ⟨S|x⟩ is the average of the stimuli having projection 
value x onto the vector L (using the same binning of x as 
for the probability distributions P_L(x) and P_L(x|spike)). 
Similarly, ⟨S|x, spike⟩ is the average of the stimuli that 
led to a spike and had projection value x onto the vector L. 
We evaluate the derivative at a particular value of x using 
Savitzky-Golay coefficients (W.H. Press et al., Numerical 
Recipes, Cambridge University Press 1998) based on two 
adjacent bins on either side of the bin with the value x; if 
projection values from any one of these bins were not 
encountered in the stimulus ensemble, the corresponding 
average did not contribute to the derivative. We find that 
the use of Savitzky-Golay smoothing coefficients is not 
required, but helps improve convergence of the algorithm 
(note that in the search algorithm, described below, the 
trial vectors are accepted based on information values, 
which are evaluated without smoothing). This analysis 
requires that stimuli and spike trains be binned at the same 
time resolution (33 ms for natural stimuli and 16 ms for 
noise stimuli). Therefore, a stimulus occasionally 
corresponds to multiple spikes in a bin. When that happened, 
the projection value of such a stimulus was counted as many 
times as there were spikes for all the probability 
distributions and averages in Eqs. (6) and (7). 
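
The information of Eq. (6) can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `single_spike_information` and its arguments are hypothetical, the 21 bins are taken as equal-width bins over the observed range of projections, and multiple spikes per bin are handled by weighting each stimulus by its spike count, as described above. No smoothing is applied, matching the statement that information values are evaluated without smoothing.

```python
import numpy as np

def single_spike_information(stimuli, spike_counts, L, n_bins=21):
    """Information (bits) along L: KL distance between P_L(x|spike) and P_L(x).

    stimuli      : array (n_samples, n_dims), one stimulus per time bin
    spike_counts : array (n_samples,), spikes in each time bin
    L            : array (n_dims,), trial filter
    """
    x = stimuli @ L  # projections of each stimulus onto L
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    p_x, _ = np.histogram(x, bins=edges)
    # A stimulus with k spikes is counted k times in the conditional histogram.
    p_xs, _ = np.histogram(x, bins=edges, weights=spike_counts)
    p_x = p_x / p_x.sum()
    p_xs = p_xs / p_xs.sum()
    mask = p_xs > 0  # bins without spikes contribute nothing to the sum
    return float(np.sum(p_xs[mask] * np.log2(p_xs[mask] / p_x[mask])))
```

For a neuron whose spikes are independent of the projection, the two histograms coincide and the function returns zero; a neuron that thresholds the projection yields a value close to its single-spike information.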

The search for the most informative dimension (MID) 
is initialized by setting the starting vector equal to the 
STA. To generate a new trial vector, we perform a line 
maximization (W.H. Press et al., Numerical Recipes, 
Cambridge University Press 1998) along the line defined 
by the gradient (7), and choose, on average, the vector with 
the largest information. Because information (6) as a 
function of components of the vector L has local maxima, 
smaller information values are accepted with Boltzmann 
probability, exp(−ΔI/T), where ΔI is the decrease in 
information between the new and old trial vector measured 
in units of the information I_spike carried by the arrival of 
a single spike, and the parameter T is called the effective 
temperature of the simulated annealing cooling scheme. 
Information values in these units are typically less than 
one (unless there is overfitting) . Therefore, we start the 
simulated annealing scheme with T = 1, and decrease it 
by a factor of 0.95 after each line maximization. If the 
search appears to have converged with a fraction preci- 
sion of 5 X 10~^ and the effective temperature T < 10~^, 
then the effective temperature is increased by a factor 
of 5, but not to exceed the starting temperature value. 
This results in repeated "cooling" and "remelting", and 
is equivalent to restarting the algorithm multiple times. 
We limit the total number of line maximizations to 3000. 
The best vector found in terms of information during the 
overall maximization procedure is taken as the most in- 
formative dimension L. Cross-validation is performed by 
leaving out 1/8 of data and treating that 1/8 as a test 
set. We compute information on the test set after every 
100 line maximizations, and if the information value has 
dropped on the test set by 25% of its maximum value, 
the optimization procedure is stopped and the current 
filter is taken as the MID. Such early stopping seldom 
occurs when we compute receptive fields from responses to 
natural scenes, but is common when receptive fields are 
computed from noise ensembles. This is due to the fact 
that the starting point, the STA, is very close to the 
optimal value when neural responses to the noise ensemble 
are analyzed. 

SUPPLEMENTARY FIG. 3: Panels (a,b) show that the nonlinear input/output functions f(x) = P(spike|x)/P(spike) 
associated with the MID filters for two exemplary cells of Fig. 1 overlap under natural (solid) and noise (dashed) stimulation 
when the stimulus projection x along the corresponding receptive fields is measured in units of its standard deviation (x-axis). 
For comparison, in Fig. 1 we plot the input/output function f(x) scaled by the firing rate, P(spike|x), the probability of a 
spike in a 33 ms window given a stimulus projection value x along the receptive field. Therefore the difference in scale for the 
nonlinearities observed between natural and noise conditions in Fig. 1, as for example cell 856 2, reflects only a change in the 
mean firing rate under the two conditions. Panels (c,d) show the probability distributions P(x) of projections x for natural (solid) 
and noise (dashed) stimulation. 
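
The acceptance rule, cooling schedule and early-stopping criterion of the search described in this section can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, and `T_min` is a placeholder for the convergence temperature threshold, whose value is not assumed here to match the one used in the study.

```python
import math
import random

def accept(delta_info, temperature, rng=random):
    """Boltzmann acceptance. delta_info is the DECREASE in information
    (in units of I_spike) from the old to the new trial vector."""
    if delta_info <= 0:  # information did not decrease: always accept
        return True
    return rng.random() < math.exp(-delta_info / temperature)

def next_temperature(T, converged, T_start=1.0, cool=0.95,
                     reheat=5.0, T_min=1e-5):
    """Cool by a factor of 0.95 after each line maximization; if the search
    has converged at low temperature, "remelt" by a factor of 5, capped at
    the starting temperature. T_min is a placeholder value."""
    T = T * cool
    if converged and T < T_min:
        T = min(T * reheat, T_start)
    return T

def should_stop(test_info_history, drop_fraction=0.25):
    """Early stopping: test-set information (checked every 100 line
    maximizations) has dropped by 25% of its maximum value."""
    best = max(test_info_history)
    return test_info_history[-1] <= (1.0 - drop_fraction) * best
```

In a full search loop, `accept` would be called after each line maximization, `next_temperature` would update T, and the loop would end after 3000 line maximizations or when `should_stop` returns true.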

Because the MID method is based on a search in 
a high-dimensional space for an information maximum, 
there is of course a concern that our search might become 



stuck in a local maximum. We believe this is not a con- 
cern for the following reasons. First, as just noted, our 
search procedure is equivalent to restarting the search 
algorithm multiple times from multiple starting points, 
only the first of which is the STA, and we take the max- 
imum of information over the entire search. Second, in 
studies of model cells^(and unpublished data), we have 
found that the error (measured as 1 minus the projection 
between the true model filter and the MID found by 
the search) decreases as 1/N, where N is the number of 

SUPPLEMENTARY FIG. 4: The increase in information on a cell-by-cell basis when the noise filter is applied to the noise 
vs. natural ensemble (a) or when the natural filter is applied to the natural vs. noise ensemble (b). Panels (c,d) show this 
effect in units of I_spike. Notations are as in Fig. 3. 

spikes used to estimate the filter. This is the dependence 
predicted theoretically^^, and would not be expected to 
hold if the true maximum were not being found. Third, 
we have previously verified on model cells that begin- 
ning with a random starting point rather than the STA 
does not produce better solutions. The STA represents 
a natural choice of a starting point in that it is clearly a 
stimulus direction that carries nonzero information about 
the neuron's response. 

The MID method produces an unbiased esti- 
mate. In this section we provide a detailed derivation for 
the fact, first published in Refi^, that the MID method 
produces unbiased estimates of neural filters within a 
single-filter LN model. We will first consider the case 
of infinite data, and then go through details of the argu- 
ment with finite data. 

While the MID filter can be calculated with respect 
to any particular pattern of spikes^, in this paper we 
have concentrated on finding filters associated with sin- 
gle spikes. Therefore we will do so in this section as 
well. Information carried by individual spikes about the 



incoming stimuli is given by^: 



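$$ I_{\mathrm{spike}} = \int dS \, P(S) \, \frac{P(\mathrm{spike} \mid S)}{P(\mathrm{spike})} \log_2 \frac{P(\mathrm{spike} \mid S)}{P(\mathrm{spike})}. \qquad (8) $$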



Because this is the information between single spikes 
and full, unfiltered, stimuli, information between spikes 
and stimuli filtered along any dimension may not ex- 
ceed (8). To verify that the only filter that leads to an 
equal amount of information between spikes and stim- 
uli filtered with it is the neural receptive field L, we in- 
voke the main assumption of the single-filter LN model: 
P(spike|S) = P(spike|S * L), so that: 



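$$ I_{\mathrm{spike}} = \int dS \, P(S) \, \frac{P(\mathrm{spike} \mid S * L)}{P(\mathrm{spike})} \log_2 \frac{P(\mathrm{spike} \mid S * L)}{P(\mathrm{spike})}. $$

The integration over all stimulus dimensions, dS, 
can be carried out separately along the relevant stimulus 
dimension, S * L, and along the rest of the stimulus 
dimensions, which we denote as S_⊥: 

$$ I_{\mathrm{spike}} = \int d(S * L) \int dS_\perp \, P(S * L, S_\perp) \, \frac{P(\mathrm{spike} \mid S * L)}{P(\mathrm{spike})} \log_2 \frac{P(\mathrm{spike} \mid S * L)}{P(\mathrm{spike})}. $$

Integration with respect to all of the irrelevant stimulus 
dimensions S_⊥ results in: 

$$ I_{\mathrm{spike}} = \int d(S * L) \, P(S * L) \, \frac{P(\mathrm{spike} \mid S * L)}{P(\mathrm{spike})} \log_2 \frac{P(\mathrm{spike} \mid S * L)}{P(\mathrm{spike})}, $$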



which is precisely the information along the filter L, 
cf. Eq. (6). We have thus shown that information 
along the filter that represents the neural receptive field 
achieves the maximal information possible, I_spike, and 
describes the encoding S → S * L → spikes. Filtering 
along any other dimension V will correspond to encoding 
S → S * V → S * L → spikes or S → S * L → S * V → spikes 
and, by the data processing inequality 
(Cover and Thomas, John Wiley Inc. 1991), leads to a 
lower information value. The data processing 
inequality applies to stochastic inputs but presumes 
that we know exact probabilities such as P(S * L | spike) 
and P(S * V | spike). This shows that the MID method 
is unbiased in the limit of infinite data and stochastic 
neurons. 

With finite data, we have only a limited number 
of samples to measure the probability distributions 
P(S * L | spike) and P(S | spike). With N spikes, our 
empirical estimates of these probability distributions, 
P_N(S * L | spike) and P_N(S | spike), will differ from 
experiment to experiment in such a way that the average across 
trials produces the true distribution and the variance 
across trials acquires a term of the order of 1/N: 

$$ \langle P_N(S \mid \mathrm{spike}) \rangle = P(S \mid \mathrm{spike}), \qquad (9) $$

$$ \langle P_N(S \mid \mathrm{spike})^2 \rangle = P(S \mid \mathrm{spike})^2 + \frac{1}{N} P(S \mid \mathrm{spike}) \left( 1 - P(S \mid \mathrm{spike}) \right), \qquad (10) $$



where we have used the properties of the binomial 
distribution: each particular stimulus S can occur with a spike 
anywhere between 0 and N times, if N is the total number 
of spikes. Similar relations can be used with the other 
probability distributions involved. 
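
Equations (9) and (10) can be checked numerically for a single stimulus value. The sketch below is only an illustration: the values of `p_true`, `n_spikes` and `n_experiments` are arbitrary choices, not quantities from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.2             # true P(S|spike) for one stimulus (arbitrary)
n_spikes = 50            # N, spikes per simulated experiment (arbitrary)
n_experiments = 200_000  # number of simulated experiments

# Number of the N spikes that coincide with stimulus S in each experiment.
counts = rng.binomial(n_spikes, p_true, size=n_experiments)
p_emp = counts / n_spikes  # empirical estimate P_N(S|spike)

mean_emp = p_emp.mean()                    # compare with Eq. (9)
second_moment_emp = (p_emp ** 2).mean()    # compare with Eq. (10)
second_moment_pred = p_true**2 + p_true * (1 - p_true) / n_spikes

print(mean_emp, p_true)
print(second_moment_emp, second_moment_pred)
```

The empirical mean matches the true probability, and the second moment exceeds its square by the 1/N term of Eq. (10).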

The deviation between the true filter and the MID filter 
obtained with a particular data set, δV, is proportional 
to the gradient of information (evaluated with finite 
data) at the position of the true filter: δV ∼ ∇I(L). 
Here we show that, as was stated in Refi^, the gradi- 
ent of information is zero, after averaging across trials, 
for the true filter. To verify this we represent the 
information as I_N(L) = I(L) + δI_N(L), the information obtained 
with infinite data plus the deviation from it due to finite 
sampling. The gradient of the information is zero at the 
true filter L. The deviation is 

$$ \delta I_N(L) = \int dx \, \delta P_N(x \mid \mathrm{spike}) \log_2 \frac{P(x \mid \mathrm{spike})}{P(x)} + \int dx \, \delta P_N(x \mid \mathrm{spike}), \qquad (11) $$



where x = S * L, δP_N(x|spike) = P_N(x|spike) − 
P(x|spike) is the difference between the empirical and 
true distributions, and there is no need to consider 
noise in the stimulus distribution P(x) because it can 
be taken as the one actually used in the experiment. 
Next we take into account that the empirical 
distribution obeys a normalization constraint, 
∫ dx P_N(x|spike) = 1, and therefore ∫ dx δP_N(x|spike) = 0, 
so that: 

$$ \delta I_N(L) = \int dx \, \delta P_N(x \mid \mathrm{spike}) \log_2 \frac{P(x \mid \mathrm{spike})}{P(x)}. \qquad (12) $$



But the average of the empirical distributions is the true 
distribution (9), so ⟨δI_N(L)⟩ = 0 in the first-order 
approximation in the deviations between empirical and true 
distributions. The second-order approximation results, 
using the property Eq. (10), in a uniform correction: 
δI_N(L) ∼ N_bins/N_spike, where N_spike is the number of spikes 
and N_bins is the number of bins used in estimating the 
probability distribution P(x|spike). Because this 
correction is independent of the direction in the stimulus space, 
it provides a zero contribution to the gradient at the 
position of the true filter. The second-order terms determine 
the variance of the MID filters on a trial-by-trial basis, 
because while the deviations themselves, δV ∼ ∇I(L), 
are proportional to the gradient of information, their 
variance ⟨δV_i δV_j⟩ ∼ ⟨∇_i I(L) ∇_j I(L)⟩ is proportional to 
pairwise gradient correlations. Using Eqs. (11) and (10), 
one can show that the leading term determining this 
variance behaves as ∼ 1/N_spike. The exact coefficient can 
be found in Ref."^°. This means that while different MID 
filters obtained based on different empirical distributions 
deviate from each other and from the true filter, these 
deviations have zero mean and finite variance that 
decreases as ∼ 1/N_spike with increasing number of spikes. 
While there may be terms ~ -^spfke describing a shift in 
the mean, these will be masked by a much larger effect of 
variance between estimates decreasing as N~\^. This is 
what we mean by saying that the MID method is unbi- 
ased. Note that the gradient of information evaluated at 
the filters of the linear model (STA or decorrelated STA) 
will be non-zero, with terms of order O(1), which do not 
depend on the number of spikes and remain finite even 
in the limit of infinite data. 

However there are ways in which the stimulus ensemble 
can influence the single MID even in a neuron that does 
not adapt, if the relevant subspace (RS) has two or more 
dimensions. In this shown in Ref.— , Appendix 

B, the single MID for that ensemble may include a 
component outside of the RS if the ensemble is such that the 
average stimulus given the projections along the relevant 
dimensions is not a linear function of each projection (as 
can occur for non-Gaussian ensembles). Any such effects, 
however, would be instantaneous and would not yield a 
time-dependence to the calculation of information as in 
Fig. 4. 

SUPPLEMENTARY FIG. 5: Coarse evolution of adaptive neural filters. (a-f) Comparison of neural filters derived 
from the first half (a,d), middle half (b,e) or last half (c,f) of stimulation with noise and natural inputs. Notations are as in 
Fig. 2(a,d). In panels (g,h) we plot only natural filters to show that they overlap. In panels (i,j) we compare three of the 
noise filters derived from the first half of the data (magenta), middle half of the data (yellow), and last half of the data (red) to 
the natural filters of the last half of the data. With time, noise neural filters diverge from natural filters. 

SUPPLEMENTARY FIG. 6: (a) The neural filter derived from the middle half of natural stimulation is applied to the first 
and last quarter of the natural input ensemble. Notations are as in Fig. 4. The solid line is an exponential fit, dashed lines 
show one standard deviation based on the Jacobian of the fit, p = 0.007. The remaining panels show that the relevant 
statistical properties of the input ensemble are stable and cannot account for the time dependence seen in Fig. 4. Here we 
show the mean and standard deviations (in arbitrary units) for natural and noise input stimuli filtered differently: (b) 
natural stimuli (first half of the data) filtered with natural neural filters computed from second half of the data; (c) natural 
stimuli (all duration) filtered with noise neural filters; (d) noise stimuli (all duration) filtered with natural neural filters; (e) 
noise stimuli (first half of the data) filtered with noise neural filters obtained from the second half. 

SUPPLEMENTARY FIG. 7: Information carried by the noise filter about the neuron's response, as a function of time after 
exposure to the noise ensemble (a) or natural stimulus ensemble (b). Information values were evaluated along the noise filter 
derived from the second half (a) and from full recording (b) of noise stimulation. No significant time dependence could be 
established. Notations are as in Fig. 4. Left and right blue bars show average information carried by noise filter about 
responses to noise ensemble (taller bar) or natural ensemble (shorter bar). Note that the average information values 
computed for the short time segments for the noise filter applied to the noise ensemble (a) are all smaller than the average 
information computed over the whole noise ensemble (right bar in a). This suggests that these short-time estimates are too 
noisy to be reliable in the case of the noise filter, which may provide another reason that we could observe no trend for the 
noise filter. A similar problem can be seen in (b). Note that a similar problem did not arise for the natural filter (main text, 
figure 4): short-time estimates were equal in size to the estimate over the whole ensemble after adaptation. We used the filter 
from the full recording in (b) (unlike in main text, figure 4, where the same filter was used in (a) and (b) for consistency) 
because the short-time estimates for the filter from the second half of the recording showed an even stronger tendency to 
have low information values; using the full recording helps fight noise and so improves the situation, but not sufficiently. 

Details of stimulus presentation and filter anal- 
ysis. The visual input signals were presented as two- 
dimensional spatiotemporal patterns of light intensities 
on a video monitor with a refresh rate of 120 Hz. The 
frame update rate was 60 Hz in the case of the white 
noise stimulus ensemble and 30 Hz in the case of the 
natural stimulus ensemble (our commercial cameras did not 
provide higher temporal resolution than that of 
television, which is 30 Hz). No corrections were made for the 
camera's nonlinear amplitude-to-intensity transformation 
function. 

The optimal orientation was determined from 
responses to a set of evenly spaced orientations at 10° 
intervals, with a spatial frequency of 0.5 cycles/degree and 
a temporal frequency of 2 Hz. The optimal spatial 
frequency was derived from responses to a set of moving 
gratings of optimal orientation and variable spatial fre- 
quencies (approximately logarithmically spaced between 
0.1 and 4 cycles/degree). 

Spatial frequency profiles were obtained by taking the 
Fourier transform in time and, with zero-padding to 
32 × 32, in space. Linear interpolation between pixels of 
the 2D transform was used to derive one-dimensional 
profiles along the preferred orientation of each cell. Before 
averaging across cells, the spatial frequency profiles of 
individual cells were normalized to unit length across all 
spatial and temporal frequencies. Identical procedures 
were used for receptive fields and stimuli comprising the 
input ensembles (averaging over all three-frame 
subsequences, e.g. 1-2-3, 2-3-4, etc.). 
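
The profile computation described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the function name `spatial_frequency_profile`, the parameter `theta` (the angle, in radians, of the sampling line in the frequency plane) and the sampling of one point per frequency pixel are assumptions, and the filter is assumed to be no larger than 32 × 32 pixels.

```python
import numpy as np

def spatial_frequency_profile(filt, theta, pad=32):
    """Amplitude spectrum of a spatiotemporal filter along one orientation.

    filt  : array (n_frames, height, width), height and width <= pad
    theta : angle (radians) of the sampling line in the frequency plane
    Returns an array (n_frames, pad // 2), normalized to unit length.
    """
    n_t = filt.shape[0]
    # Fourier transform in time and, zero-padded to pad x pad, in space.
    amp = np.abs(np.fft.fftn(filt, s=(n_t, pad, pad)))
    amp = np.fft.fftshift(amp, axes=(1, 2))  # zero frequency at pad // 2
    c = pad // 2
    r = np.arange(c)  # radial spatial frequency, in FFT pixels
    y = c + r * np.sin(theta)
    x = c + r * np.cos(theta)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, pad - 1)
    x1 = np.minimum(x0 + 1, pad - 1)
    wy, wx = y - y0, x - x0
    # Bilinear (linear in each direction) interpolation between pixels.
    prof = ((1 - wy) * (1 - wx) * amp[:, y0, x0]
            + (1 - wy) * wx * amp[:, y0, x1]
            + wy * (1 - wx) * amp[:, y1, x0]
            + wy * wx * amp[:, y1, x1])
    # Unit length across all sampled spatial and temporal frequencies.
    return prof / np.linalg.norm(prof)
```

For a 16 × 16 patch containing a grating with two cycles across its width, the profile along `theta = 0` peaks at the fourth frequency pixel of the 32-point transform, as expected from the zero-padding.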

In Fig. 3, the information I was calculated from 
jackknife estimates of the filters. For each cell, for either the 
natural or noise ensemble, eight jackknife estimates were 
derived, each from 7/8 of the data with the remaining 
1/8 of the data serving as a test set on which the 
information was calculated. The mean of these 8 estimates 
was assigned as the information I for that cell and ensemble. 
I_spike is calculated from responses to 50-150 repetitions 
of an 11 s long segment of the natural or noise 
ensemble. Finite-size corrections were applied to both I and 
I_spike. As a control for the information calculation, we 
calculated natural MID filters for a series of model 
simple cells with a static filter where the number of spikes 
emitted over the course of the test set varied from 80 to 
13,000. The calculated information, of course, decreased 
substantially at low numbers of spikes, but it did so 
similarly whether the filter was applied to the natural or the 
noise ensemble. There was no significant difference 
between the information about the natural ensemble and 
about the noise ensemble for any choice of nonlinearity, 
that is, for any signal-to-noise ratio. 
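
The eight-fold jackknife described at the start of this paragraph can be sketched as below. This is a minimal illustration under an assumption not stated in the text: the data are split into eight contiguous blocks of time bins. The function name `jackknife_splits` is hypothetical.

```python
import numpy as np

def jackknife_splits(n_samples, n_folds=8):
    """Yield (train, test) index arrays: each test set is one of n_folds
    contiguous blocks (1/8 of the data), the rest (7/8) is the training set."""
    folds = np.array_split(np.arange(n_samples), n_folds)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, test
```

A filter would be fitted on each `train` set, the information evaluated on the matching `test` set, and the eight resulting values averaged to give the information for that cell and ensemble.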



